QC and filter seurat object¶

In [1]:
library_load <- suppressMessages(
    
    list(
        
        # Seurat 
        library(Seurat), 
        
        # Scran doubletFinder implementation 
        library(scDblFinder), 
        
        # FastMNN implementation 
        library(SeuratWrappers), 
        
        # Data 
        library(tidyverse), 
        
        # Reticulate
        library(reticulate), 
        
        # Biomart 
        library(biomaRt), 
        
        # Plotting 
        library(ggplot2), 
        library(knitr),
        library(kableExtra),
        library(IRdisplay)
        
    )
)
In [2]:
random_seed <- 42
set.seed(random_seed)
In [3]:
options(warn=-1)
In [4]:
# Set working directory to project root
setwd("/research/peer/fdeckert/FD20200109SPLENO")
In [5]:
# Source files
source("plotting_global.R")
source("bin/seurat_qc.R")
source("bin/seurat_dea.R")

Parameter settings¶

In [6]:
# Files 
so_raw_file <- "data/object/raw.rds"

so_qc_rds <- "data/object/qc.rds"
so_qc_h5ad <- "data/object/qc.h5ad"

# Plotting Theme
ggplot2::theme_set(theme_global_set()) # From project global source()

Import Seurat object¶

In [7]:
so_raw <- readRDS(so_raw_file)
In [8]:
so_raw$tissue <- factor(so_raw$tissue, levels=c("Myeloid", "Progenitor"))
so_raw$treatment <- factor(so_raw$treatment, levels=c("NaCl", "CpG"))
In [9]:
GetAssayData(so_raw, assay="RNA", slot="data")["Bmp4", ] %>% sum
24.7153370890719

Rank plot¶

Empty droplets were determined with CellRanger V3.0.2 Lun et al., 2019 EmptyDrop heuristic. RNAse activity of granulocytes might be wrongly identified as empty cells by CellRanger.

Typical Sample A steep drop-off is indicative of good separation between the cell-associated barcodes and the barcodes associated with empty GEMs. A ideal barcode rank plot has a distincitve shape, which is referred to as a "cliff and knee".
Heterogeneous Sample Heterogeneous populations of cells in a sample result in two "cliff and knee" distributions. However, there should still be clear separation between the bacodes.
Compromised Sample Round curve and lack of steep cliff may indicate low sample quality or loss of single-cell behavior. This can be due to a wetting failure, premature cell lysis, or low cell viability.
Compromised Sample Defined cliff and knee, but the total number of barcodes detected may be lower than expected. This can be caused by a sample clog or inaccurate cell count.

In [10]:
options(repr.plot.width=10, repr.plot.height=5)
rank_plot_qc_1 <- rank_plot_qc(so_raw, color_color=color$cellranger_class, formular=tissue~sample_rep+treatment)
rank_plot_qc_1

Filter by CellRanger class¶

In [11]:
so_qc <- subset(so_raw, subset=cellranger_class=="Cell")

Compute cell cycle score¶

In [12]:
# Get Seurat cell cycle genes 
cc_genes_seurat_s <- cc.genes.updated.2019$s.genes
cc_genes_seurat_g2m <- cc.genes.updated.2019$g2m.genes

# Get mouse orthologs from human gene simbols
httr::set_config(httr::config(ssl_verifypeer=FALSE))

hgnc_mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl", host="https://dec2021.archive.ensembl.org/")
mm_mart <- useMart("ensembl", dataset="mmusculus_gene_ensembl", host="https://dec2021.archive.ensembl.org/")

cc_genes_seurat_s <- getLDS(attributes=c("hgnc_symbol"), filters="hgnc_symbol", values=cc_genes_seurat_s, mart=hgnc_mart, attributesL=c("mgi_symbol"), martL=mm_mart, uniqueRows=TRUE)
cc_genes_seurat_s <- cc_genes_seurat_s[, 2]
    
cc_genes_seurat_g2m <- getLDS(attributes=c("hgnc_symbol"), filters="hgnc_symbol", values=cc_genes_seurat_g2m, mart=hgnc_mart, attributesL=c("mgi_symbol"), martL=mm_mart, uniqueRows=TRUE)
cc_genes_seurat_g2m <- cc_genes_seurat_g2m[, 2]

# Save cell cycle genes 
saveRDS(list(s=cc_genes_seurat_s, g2m=cc_genes_seurat_g2m), "data/annotation/cell_cycle_seurat/cc_genes.rds")

# Compute cell cycle score 
so_qc <- CellCycleScoring(so_qc, s.features=cc_genes_seurat_s, g2m.features=cc_genes_seurat_g2m, set.ident=FALSE)

colnames(so_qc@meta.data) <- gsub("Phase", "cc_phase_class", colnames(so_qc@meta.data))
colnames(so_qc@meta.data) <- gsub("S.Score", "msS_RNA", colnames(so_qc@meta.data))
colnames(so_qc@meta.data) <- gsub("G2M.Score", "msG2M_RNA", colnames(so_qc@meta.data)) 
    
so_qc$msCC_diff_RNA <- so_qc$msS_RNA - so_qc$msG2M_RNA
so_qc$cc_phase_class <- factor(so_qc$cc_phase_class, levels=names(color$cc_phase_class))

SingleR annotation¶

SingleR identifies marker genes from the reference and uses them to compute assignment score (based on the Spearman correlation across markers) for each cell in the test dataset against each label in the reference. The label with the highest doublet_stat is the assigned to the test cell, possibly with further fine-tuning to resolve closely related labels.

first.labels: Labels before fine-tuning
labels: Labels after fine-tuning
pruning: labels after pruning

ImmGen¶

In [13]:
if(FALSE) {
    
    # Load reference data 
    ref <- ImmGenData()
    
    # Seurat object to SingleCellExperiment
    sce <- SingleCellExperiment(list(counts=so_qc@assays$RNA@counts))

    # # Predict labels
    label_main <- SingleR::SingleR(test=sce, ref=ref, labels=ref$label.main, assay.type.test="counts", de.method="classic")
    label_fine <- SingleR::SingleR(test=sce, ref=ref, labels=ref$label.fine, assay.type.test="counts", de.method="classic")

    saveRDS(label_main, "result/singler/label_main_immgen.rds")
    saveRDS(label_fine, "result/singler/label_fine_immgen.rds")

} else {
    
    label_main <- readRDS("result/singler/label_main_immgen.rds")
    label_fine <- readRDS("result/singler/label_fine_immgen.rds")
    
}

# Add labels to Seurat object
label_main_meta <- as.data.frame(label_main) %>% dplyr::select(pruned.labels, tuning.scores.first) %>% dplyr::rename(label_main_immgen=pruned.labels, delta_score_main_immgen=tuning.scores.first)
label_fine_meta <- as.data.frame(label_fine) %>% dplyr::select(pruned.labels, tuning.scores.first) %>% dplyr::rename(label_fine_immgen=pruned.labels, delta_score_fine_immgen=tuning.scores.first)

so_qc <- AddMetaData(so_qc, cbind(label_main_meta, label_fine_meta))

Haemosphere¶

In [14]:
if(FALSE) {
    
    # Load reference data 
    ref <- readRDS("data/haemosphere/se_haemosphere.rds")
    
    # Seurat object to SingleCellExperiment
    sce <- SingleCellExperiment(list(counts=so_qc@assays$RNA@counts))

    # # Predict labels
    label_main <- SingleR::SingleR(test=sce, ref=ref, labels=ref$label.main, assay.type.test="counts", de.method="classic")
    label_fine <- SingleR::SingleR(test=sce, ref=ref, labels=ref$label.fine, assay.type.test="counts", de.method="classic")

    saveRDS(label_main, "result/singler/label_main_haemosphere.rds")
    saveRDS(label_fine, "result/singler/label_fine_haemosphere.rds")

} else {
    
    label_main <- readRDS("result/singler/label_main_haemosphere.rds")
    label_fine <- readRDS("result/singler/label_fine_haemosphere.rds")
    
}

# Add labels to Seurat object
label_main_meta <- as.data.frame(label_main) %>% dplyr::select(pruned.labels, tuning.scores.first) %>% dplyr::rename(label_main_haemosphere=pruned.labels, delta_score_main_haemosphere=tuning.scores.first)
label_fine_meta <- as.data.frame(label_fine) %>% dplyr::select(pruned.labels, tuning.scores.first) %>% dplyr::rename(label_fine_haemosphere=pruned.labels, delta_score_fine_haemosphere=tuning.scores.first)

so_qc <- AddMetaData(so_qc, cbind(label_main_meta, label_fine_meta))

Doublet detection on pre-filtered cells¶

I first tried the strategy from Sala et al., 2019 which was also used in Barile et al., 2021 with the source documented here. We only do the refined method with combined samples. The functions used in the script are documented here. We do doublet detection on sample groups (Myeloid and Progenitor cells of a replicated) and integrated those groups per batch and then by treatment with FastMNN. However, I default later back to just use the standard approach.

In [15]:
qc_class_set <- function(so) {
    
    print(so)
    
    # QC filter for all cells
    so$pMt_RNA_min <- -Inf
    so$pMt_RNA_max <- 5
    
    so$nCount_RNA_min <- 1500
    so$nCount_RNA_max <- max(so$nCount_RNA)

    so$qc_class <- ifelse(
        so$cellranger_class=="Cell" & 
        so$nCount_RNA <= so$nCount_RNA_max & 
        so$nCount_RNA > so$nCount_RNA_min &
        so$pMt_RNA <= so$pMt_RNA_max &
        so$pMt_RNA > so$pMt_RNA_min,
        "pass", "fail"
      )
    
    
    so <- subset(so, subset=qc_class=="pass")
    
    print(so)
    
    return(so)

}
In [16]:
so_db <- Seurat::SplitObject(so_qc, split.by="sample_group")
so_db <- lapply(so_db, qc_class_set)
An object of class Seurat 
14772 features across 8440 samples within 1 assay 
Active assay: RNA (14772 features, 0 variable features)
An object of class Seurat 
14772 features across 5381 samples within 1 assay 
Active assay: RNA (14772 features, 0 variable features)
An object of class Seurat 
14772 features across 11470 samples within 1 assay 
Active assay: RNA (14772 features, 0 variable features)
An object of class Seurat 
14772 features across 6730 samples within 1 assay 
Active assay: RNA (14772 features, 0 variable features)
An object of class Seurat 
14772 features across 6727 samples within 1 assay 
Active assay: RNA (14772 features, 0 variable features)
An object of class Seurat 
14772 features across 3290 samples within 1 assay 
Active assay: RNA (14772 features, 0 variable features)
An object of class Seurat 
14772 features across 10947 samples within 1 assay 
Active assay: RNA (14772 features, 0 variable features)
An object of class Seurat 
14772 features across 4567 samples within 1 assay 
Active assay: RNA (14772 features, 0 variable features)

Run scDblFinder on pre-filtered data¶

In [17]:
# Compute doublet density per sample group 
so_db <- lapply(so_db, function(so) {
    
    DefaultAssay(so) <- "RNA"
    
    # Remove empty genes from split 
    cnt <- GetAssayData(so, assay="RNA", slot="counts")
    cnt <- cnt[rowSums(cnt) > 0, ]
    
    # Compute doublet density 
    sce <- scDblFinder::scDblFinder(cnt)
    so$doublet_score <- sce$scDblFinder.score
    so$doublet_score_log2 <- log2(sce$scDblFinder.score)
    so$doublet_class <- sce$scDblFinder.class
    
    return(so)
    
}
               )
Assuming the input to be a matrix of counts or expected counts.

Creating ~5000 artificial doublets...

Dimensional reduction

Evaluating kNN...

Training model...

iter=0, 405 cells excluded from training.

iter=1, 468 cells excluded from training.

iter=2, 481 cells excluded from training.

Threshold found:0.481

406 (7.5%) doublets called

Assuming the input to be a matrix of counts or expected counts.

Creating ~5000 artificial doublets...

Dimensional reduction

Evaluating kNN...

Training model...

iter=0, 604 cells excluded from training.

iter=1, 493 cells excluded from training.

iter=2, 448 cells excluded from training.

Threshold found:0.506

396 (5.9%) doublets called

Assuming the input to be a matrix of counts or expected counts.

Creating ~5000 artificial doublets...

Dimensional reduction

Evaluating kNN...

Training model...

iter=0, 190 cells excluded from training.

iter=1, 233 cells excluded from training.

iter=2, 236 cells excluded from training.

Threshold found:0.74

159 (4.8%) doublets called

Assuming the input to be a matrix of counts or expected counts.

Creating ~5000 artificial doublets...

Dimensional reduction

Evaluating kNN...

Training model...

iter=0, 322 cells excluded from training.

iter=1, 287 cells excluded from training.

iter=2, 274 cells excluded from training.

Threshold found:0.504

228 (5%) doublets called

In [18]:
options(repr.plot.width=30, repr.plot.height=5)

# Compute sub-cluster per sample group 
so_db <- lapply(so_db, function(so) {
    
    # Cluster function     
    so <- ScaleData(so, features=rownames(so), assay="RNA", verbose=FALSE)
    so <- FindVariableFeatures(so)
    so <- RunPCA(so, npcs=50, features=VariableFeatures(so), assay="RNA", verbose=FALSE)
    so <- FindNeighbors(so, dims=1:50, k.param=15, reduction="pca", verbose=FALSE)
    so <- FindClusters(so, verbose=FALSE, resolution=1, algorithm=1)
    so <- RunUMAP(so, dims=1:50, n.neighbors=30, verbose=FALSE)
    
    dplot_1 <- dplot(so, reduction="umap", group_by="treatment") + scale_color_manual(values=color$treatment) + ggtitle("Treatment")
    dplot_2 <- dplot(so, reduction="umap", group_by="tissue") + scale_color_manual(values=color$tissue) + ggtitle("Tissue")
    dplot_3 <- dplot(so, reduction="umap", group_by="doublet_class") + ggtitle("doublet_class")
    fplot_1 <- fplot(so, reduction="umap", features="doublet_score") + scale_colour_viridis_c(option="inferno") + ggtitle("doublet_score")
    fplot_2 <- fplot(so, reduction="umap", features="doublet_score_log2") + scale_colour_viridis_c(option="inferno") + ggtitle("doublet_score_log2")
    
    dplot_comb <- dplot_1 + dplot_2 + dplot_3 + fplot_1 + fplot_2 + plot_layout(ncol=5)
    plot(dplot_comb)
    
    return(so)
    
}
               )
Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Add scDblFinder results to unfiltered Seurat object¶

In [19]:
so_db <- merge(so_db[[1]], so_db[2:length(so_db)])
so_qc <- AddMetaData(so_qc, dplyr::select(so_db@meta.data, doublet_class, doublet_score, doublet_score_log2))

Set QC filter¶

In [20]:
qc_class_set <- function(so) {
    
    # QC filter for all cells
    so$pMt_RNA_min <- -Inf
    so$pMt_RNA_max <- 5
    
    so$nCount_RNA_min <- 1500
    so$nCount_RNA_max <- max(so$nCount_RNA)

    so$qc_class <- ifelse(
        so$cellranger_class == "Cell" & 
        so$nCount_RNA <= so$nCount_RNA_max & 
        so$nCount_RNA > so$nCount_RNA_min &
        so$pMt_RNA <= so$pMt_RNA_max &
        so$pMt_RNA > so$pMt_RNA_min & 
        so$doublet_class == "singlet", 
        "pass", "fail"
      )
    
    # Remove Ery from Myeloid sort before computing nFeature_RNA
    if(so$tissue[1]=="Myeloid") {
        
        so$qc_class <- ifelse(
            so$label_main_haemosphere=="Ery" | so$label_fine_haemosphere %in% c("preCFUE", "CFUE", "pbEry", "poEry", "Retic") | so$pHb_RNA >= 5, 
            "fail", so$qc_class
        )
        
    }
    
    so_tmp <- subset(so, subset=qc_class=="pass")
    
    so$nFeature_RNA_max <- max(so_tmp$nFeature_RNA)
    so$nFeature_RNA_min <- quantile(so_tmp$nFeature_RNA, 0.01)
    
    so$qc_class <- ifelse(
        so$qc_class == "pass" &
        so$nFeature_RNA <= so$nFeature_RNA_max & 
        so$nFeature_RNA > so$nFeature_RNA_min, 
        "pass", "fail"
      )
    
    return(so)

}
In [21]:
so_qc <- Seurat::SplitObject(so_qc, split.by="sample_name")
so_qc <- lapply(so_qc, qc_class_set)
so_qc <- merge(so_qc[[1]], so_qc[2:length(so_qc)])

Save raw data with filter label for further investigation¶

In [22]:
saveRDS(so_qc, "data/object/raw_filter_label.rds")

QC plots¶

Density plot¶

In [23]:
density_plot_qc_1 <- density_plot_qc(so=so_qc, title="Density plot UMI count", x=nCount_RNA, xlab="log10(UMI count)", min=nCount_RNA_min, max=nCount_RNA_max, fill_color=color$tissue)
density_plot_qc_2 <- density_plot_qc(so=so_qc, title="Density plot Feature count", x=nFeature_RNA, xlab="log10(Feature count)", min=nFeature_RNA_min, max=nFeature_RNA_max, fill_color=color$tissue)
density_plot_qc_3 <- density_plot_qc(so=so_qc, title="Density plot Mt %", x=pMt_RNA, xlab="Mt [%]", min=0, max=pMt_RNA_max, log10=FALSE, fill_color=color$tissue)
In [24]:
options(repr.plot.width=22.5, repr.plot.height=5)
density_plot_qc_1 + density_plot_qc_2 + density_plot_qc_3 + plot_layout(nrow=1) & theme(legend.position="bottom")

Scattern plot¶

In [25]:
scattern_plot_qc_1 <- scattern_plot_qc(so=so_qc, title="Mitochondrial gene percentage", color=pMt_RNA)
scattern_plot_qc_2 <- scattern_plot_qc(so=so_qc, title="Hemoglobin gene percentage", color=pHb_RNA)
scattern_plot_qc_3 <- scattern_plot_qc(so=so_qc, title="Ribsonmal gene percentage", color=pRb_RNA)
scattern_plot_qc_4 <- scattern_plot_qc(so=so_qc, title="Fine labels", color=label_fine_haemosphere) + scale_color_manual(values=color$label_fine_haemosphere)
scattern_plot_qc_5 <- scattern_plot_qc(so=so_qc, title="Doublet class", color=doublet_class) + scale_color_manual(values=c("doublet"="#132B43", "singlet"="#56B1F7"))
scattern_plot_qc_6 <- scattern_plot_qc(so=so_qc, title="QC class", color=qc_class) + scale_color_manual(values=c("fail"="#132B43", "pass"="#56B1F7"))
In [26]:
options(repr.plot.width=22.5, repr.plot.height=10)
scattern_plot_qc_1 + scattern_plot_qc_2 + scattern_plot_qc_3 + scattern_plot_qc_4 + scattern_plot_qc_5 + scattern_plot_qc_6 + plot_layout(ncol=3) & theme(legend.position="bottom")

Box plots¶

In [27]:
box_plot_qc_1 <- box_plot_qc(so=so_qc, y=nCount_RNA, fill=tissue, ylab="UMI [count]", ymin=0)
box_plot_qc_2 <- box_plot_qc(so=so_qc, y=nFeature_RNA, fill=tissue, ylab="Feature [count]", ymin=0)
box_plot_qc_3 <- box_plot_qc(so=so_qc, y=pMt_RNA, fill=tissue, ylab="Mt [%]", ymin=0, ymax=100)
box_plot_qc_4 <- box_plot_qc(so=so_qc, y=pHb_RNA, fill=tissue, ylab="Hb [%]", ymin=0, ymax=100)
box_plot_qc_5 <- box_plot_qc(so=so_qc, y=pRb_RNA, fill=tissue, ylab="Rbl [%]", ymin=0, ymax=100)
In [28]:
options(repr.plot.width=22.5, repr.plot.height=20)
box_plot_qc_1[[1]] + box_plot_qc_1[[2]] + box_plot_qc_2[[1]]  + box_plot_qc_2[[2]] + 
box_plot_qc_3[[1]] + box_plot_qc_3[[2]] + box_plot_qc_4[[1]]  + box_plot_qc_4[[2]] + 
box_plot_qc_5[[1]] + box_plot_qc_5[[2]] + plot_spacer() + plot_spacer() + plot_layout(guides="collect", ncol=2) & theme(legend.position="none")

Filter by QC class¶

In [29]:
so_qc <- subset(so_qc, subset=qc_class=="pass")

UMAP and clustering of all samples on all genes¶

In [30]:
so_qc <- NormalizeData(so_qc, assay="RNA")
so_qc <- ScaleData(so_qc, assay="RNA", verbose=FALSE)
so_qc <- RunPCA(so_qc, npcs=50, assay="RNA", features=rownames(so_qc), verbose=FALSE)
so_qc <- FindNeighbors(so_qc, dims=1:30, k.param=20, verbose=FALSE)
so_qc <- FindClusters(so_qc, verbose=FALSE, resolution=1, algorithm=1, group.singletons=FALSE)
so_qc <- RunUMAP(so_qc, dims=1:50, n.neighbors=30, verbose=FALSE)
In [31]:
options(repr.plot.width=22.5, repr.plot.height=15)

dplot_1 <- dplot(so_qc, reduction="umap", group_by="seurat_clusters", label=TRUE) + ggtitle("Louvain cluster")
dplot_2 <- dplot(so_qc, reduction="umap", group_by="tissue") + scale_color_manual(values=color$tissue) + ggtitle("Tissue")
dplot_3 <- dplot(so_qc, reduction="umap", group_by="treatment") + scale_color_manual(values=color$treatment) + ggtitle("Treatment")
dplot_4 <- dplot(so_qc, reduction="umap", group_by="cc_phase_class") + scale_color_manual(values=color$cc_phase_class) + ggtitle("Cell cycle")
dplot_5 <- dplot(so_qc, reduction="umap", group_by="label_main_haemosphere") + scale_color_manual(values=color$label_main_haemosphere) + ggtitle("Haemosphere label (main)")
dplot_6 <- dplot(so_qc, reduction="umap", group_by="label_fine_haemosphere") + scale_color_manual(values=color$label_fine_haemosphere) + ggtitle("Haemosphere label (fine)")
dplot_7 <- dplot(so_qc, reduction="umap", group_by="sample_rep") + ggtitle("Replicate")
fplot_1 <- fplot(so_qc, reduction="umap", features="Trac") + scale_colour_viridis_c(option="inferno") + ggtitle("Trac")
fplot_2 <- fplot(so_qc, reduction="umap", features="Igkc") + scale_colour_viridis_c(option="inferno") + ggtitle("Igkc")

dplot_comb <- dplot_1 + dplot_2 + dplot_3 + dplot_4 + dplot_5 + dplot_6 + dplot_7 + fplot_1 + fplot_2 + plot_layout(ncol=3)
dplot_comb
Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

Scale for 'colour' is already present. Adding another scale for 'colour',
which will replace the existing scale.

In [32]:
so_qc@meta.data %>% dplyr::group_by(seurat_clusters, treatment, sample_rep) %>% 
    dplyr::summarise(n=n()) %>% data.frame() %>% 
    tidyr::spread(seurat_clusters, n) %>% 
    kableExtra::kable("html") %>% as.character() %>% IRdisplay::display_html()
`summarise()` has grouped output by 'seurat_clusters', 'treatment'. You can
override using the `.groups` argument.
treatment sample_rep 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
CpG Rep1 41 46 96 922 1020 213 41 159 162 425 332 7 22 511 130 139 75 180 105 64 68 50 13 4 5 NA
CpG Rep2 1455 105 105 105 33 837 823 380 243 11 337 3 604 153 310 251 109 10 58 115 4 10 2 3 1 1
NaCl Rep1 3 37 347 259 100 5 18 119 347 422 39 434 2 15 68 71 114 250 124 51 92 30 21 7 9 30
NaCl Rep2 266 1411 777 30 16 79 100 259 153 34 32 279 83 1 122 72 217 7 61 45 16 46 67 44 25 2
In [33]:
marker_22 <- FindMarkers(so_qc, ident.1="22", logfc.threshold=0.5, min.pct=0.01, only.pos=FALSE)
marker_23 <- FindMarkers(so_qc, ident.1="23", logfc.threshold=0.5, min.pct=0.01, only.pos=FALSE)
marker_25 <- FindMarkers(so_qc, ident.1="25", logfc.threshold=0.5, min.pct=0.01, only.pos=FALSE)
In [34]:
options(repr.plot.width=15, repr.plot.height=5)

cluster_marker <- list(marker_22, marker_23, marker_25)

vp_cluster_marker <- lapply(seq_along(cluster_marker), function(i) vp_dea(cluster_marker[[i]], title=c("Cluster 22", "Cluster 23", "Cluster 25")[i], log2_thold=0.25))
wrap_plots(vp_cluster_marker)

Remove T cell (Trac) and Plasma cell (Igkc) cluster¶

In [35]:
so_qc <- subset(so_qc, subset=seurat_clusters!=22)
so_qc <- subset(so_qc, subset=seurat_clusters!=23)
so_qc <- subset(so_qc, subset=seurat_clusters!=25)

Save result objso_qc_rds¶

In [36]:
# Store data as RDS 
saveRDS(so_qc, so_qc_rds)
In [37]:
# Store data as h5ad 
adata <- import("anndata", as="adata", convert=FALSE)
pd <- import("pandas", as="pd", convert=FALSE)
np <- import("numpy", as="np", convert=FALSE)

    
# Transform dgCMatrix to 
X <- GetAssayData(so_qc, assay="RNA", slot="counts") %>% as.matrix() %>% t()
X <- np$array(X, dtype=np$int32)
    
adata <- adata$AnnData(X=X, obs=so_qc@meta.data)
adata$var_names <- rownames(GetAssayData(so_qc, assay="RNA", slot="counts"))

adata$raw <- adata
adata$write_h5ad(so_qc_h5ad)
None

Session info¶

In [38]:
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.5 (Ootpa)

Matrix products: default
BLAS/LAPACK: /home/fdeckert/bin/miniconda3/envs/r.4.1.0-FD20200109SPLENO/lib/libopenblasp-r0.3.15.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggrepel_0.9.1        patchwork_1.1.1      RColorBrewer_1.1-3  
 [4] IRdisplay_1.1        kableExtra_1.3.4     knitr_1.39          
 [7] biomaRt_2.50.3       reticulate_1.24      forcats_0.5.1       
[10] stringr_1.4.0        dplyr_1.0.9          purrr_0.3.4         
[13] readr_2.1.2          tidyr_1.2.0          tibble_3.1.7        
[16] ggplot2_3.3.6        tidyverse_1.3.1      SeuratWrappers_0.3.0
[19] scDblFinder_1.8.0    sp_1.4-7             SeuratObject_4.1.0  
[22] Seurat_4.1.1        

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3              pbdZMQ_0.3-7               
  [3] scattermore_0.8             R.methodsS3_1.8.1          
  [5] bit64_4.0.5                 irlba_2.3.5                
  [7] DelayedArray_0.20.0         R.utils_2.11.0             
  [9] data.table_1.14.2           rpart_4.1.16               
 [11] KEGGREST_1.34.0             RCurl_1.98-1.6             
 [13] generics_0.1.2              BiocGenerics_0.40.0        
 [15] ScaledMatrix_1.2.0          cowplot_1.1.1              
 [17] RSQLite_2.2.13              RANN_2.6.1                 
 [19] future_1.25.0               bit_4.0.4                  
 [21] tzdb_0.3.0                  spatstat.data_2.2-0        
 [23] webshot_0.5.3               xml2_1.3.3                 
 [25] lubridate_1.8.0             httpuv_1.6.5               
 [27] SummarizedExperiment_1.24.0 assertthat_0.2.1           
 [29] viridis_0.6.2               xfun_0.30                  
 [31] hms_1.1.1                   evaluate_0.15              
 [33] promises_1.2.0.1            fansi_1.0.3                
 [35] progress_1.2.2              dbplyr_2.1.1               
 [37] readxl_1.4.0                igraph_1.3.1               
 [39] DBI_1.1.2                   htmlwidgets_1.5.4          
 [41] spatstat.geom_2.4-0         stats4_4.1.0               
 [43] ellipsis_0.3.2              RSpectra_0.16-1            
 [45] backports_1.4.1             deldir_1.0-6               
 [47] sparseMatrixStats_1.6.0     MatrixGenerics_1.6.0       
 [49] vctrs_0.4.1                 SingleCellExperiment_1.16.0
 [51] Biobase_2.54.0              here_1.0.1                 
 [53] Cairo_1.5-15                remotes_2.4.2              
 [55] ROCR_1.0-11                 abind_1.4-5                
 [57] cachem_1.0.6                withr_2.5.0                
 [59] progressr_0.10.0            sctransform_0.3.3          
 [61] prettyunits_1.1.1           scran_1.22.1               
 [63] goftest_1.2-3               svglite_2.1.0              
 [65] cluster_2.1.3               lazyeval_0.2.2             
 [67] crayon_1.5.1                labeling_0.4.2             
 [69] edgeR_3.36.0                pkgconfig_2.0.3            
 [71] GenomeInfoDb_1.30.1         nlme_3.1-157               
 [73] vipor_0.4.5                 rlang_1.0.6                
 [75] globals_0.14.0              lifecycle_1.0.3            
 [77] miniUI_0.1.1.1              filelock_1.0.2             
 [79] BiocFileCache_2.2.1         modelr_0.1.8               
 [81] rsvd_1.0.5                  rprojroot_2.0.3            
 [83] cellranger_1.1.0            polyclip_1.10-0            
 [85] matrixStats_0.62.0          lmtest_0.9-40              
 [87] Matrix_1.4-1                IRkernel_1.3               
 [89] zoo_1.8-10                  reprex_2.0.1               
 [91] base64enc_0.1-3             beeswarm_0.4.0             
 [93] ggridges_0.5.3              png_0.1-7                  
 [95] viridisLite_0.4.0           bitops_1.0-7               
 [97] R.oo_1.24.0                 KernSmooth_2.23-20         
 [99] Biostrings_2.62.0           blob_1.2.3                 
[101] DelayedMatrixStats_1.16.0   parallelly_1.31.1          
[103] spatstat.random_2.2-0       S4Vectors_0.32.4           
[105] beachmat_2.10.0             scales_1.2.0               
[107] memoise_2.0.1               magrittr_2.0.3             
[109] plyr_1.8.7                  ica_1.0-2                  
[111] zlibbioc_1.40.0             compiler_4.1.0             
[113] dqrng_0.3.0                 fitdistrplus_1.1-8         
[115] cli_3.4.1                   XVector_0.34.0             
[117] listenv_0.8.0               pbapply_1.5-0              
[119] MASS_7.3-57                 mgcv_1.8-40                
[121] tidyselect_1.1.2            stringi_1.7.6              
[123] highr_0.9                   BiocSingular_1.10.0        
[125] locfit_1.5-9.5              grid_4.1.0                 
[127] tools_4.1.0                 future.apply_1.9.0         
[129] parallel_4.1.0              rstudioapi_0.13            
[131] uuid_1.1-0                  bluster_1.4.0              
[133] metapod_1.2.0               gridExtra_2.3              
[135] farver_2.1.0                Rtsne_0.16                 
[137] digest_0.6.29               BiocManager_1.30.17        
[139] rgeos_0.5-9                 shiny_1.7.1                
[141] Rcpp_1.0.8.3                GenomicRanges_1.46.1       
[143] broom_0.8.0                 scuttle_1.4.0              
[145] later_1.3.0                 RcppAnnoy_0.0.19           
[147] httr_1.4.3                  AnnotationDbi_1.56.2       
[149] colorspace_2.0-3            rvest_1.0.2                
[151] XML_3.99-0.9                fs_1.5.2                   
[153] tensor_1.5                  IRanges_2.28.0             
[155] splines_4.1.0               uwot_0.1.11                
[157] statmod_1.4.36              spatstat.utils_2.3-0       
[159] scater_1.22.0               xgboost_1.6.0.1            
[161] plotly_4.10.0               systemfonts_1.0.4          
[163] xtable_1.8-4                jsonlite_1.8.0             
[165] R6_2.5.1                    pillar_1.8.1               
[167] htmltools_0.5.2             mime_0.12                  
[169] glue_1.6.2                  fastmap_1.1.0              
[171] BiocParallel_1.28.3         BiocNeighbors_1.12.0       
[173] codetools_0.2-18            utf8_1.2.2                 
[175] lattice_0.20-45             spatstat.sparse_2.1-1      
[177] curl_4.3.2                  ggbeeswarm_0.6.0           
[179] leiden_0.3.10               survival_3.3-1             
[181] limma_3.50.3                rmarkdown_2.14             
[183] repr_1.1.4                  munsell_0.5.0              
[185] GenomeInfoDbData_1.2.7      haven_2.5.0                
[187] reshape2_1.4.4              gtable_0.3.0               
[189] spatstat.core_2.4-2